DE eng

Search in the Catalogues and Directories

Hits 1 – 4 of 4

1
Cross-lingual few-shot hate speech and offensive language detection using meta learning
In: ISSN: 2169-3536 ; EISSN: 2169-3536 ; IEEE Access ; https://hal.archives-ouvertes.fr/hal-03559484 ; IEEE Access, IEEE, 2022, 10, pp.14880-14896. ⟨10.1109/ACCESS.2022.3147588⟩ (2022)
Abstract: International audience ; Automatic detection of abusive online content such as hate speech, offensive language, threats, etc. has become prevalent in social media, with multiple efforts dedicated to detecting this phenomenon in English. However, detecting hatred and abuse in low-resource languages is a non-trivial challenge. The lack of sufficient labeled data in low-resource languages and inconsistent generalization ability of transformer-based multilingual pre-trained language models for typologically diverse languages make these models inefficient in some cases. We propose a meta learning-based approach to study the problem of few-shot hate speech and offensive language detection in low-resource languages that will allow hateful or offensive content to be predicted by only observing a few labeled data items in a specific target language. We investigate the feasibility of applying a meta learning approach in cross-lingual few-shot hate speech detection by leveraging two meta learning models based on optimization-based and metric-based (MAML and Proto-MAML) methods. To the best of our knowledge, this is the first effort of this kind. To evaluate the performance of our approach, we consider hate speech and offensive language detection as two separate tasks and make two diverse collections of different publicly available datasets comprising 15 datasets across 8 languages for hate speech and 6 datasets across 6 languages for offensive language. Our experiments show that meta learning-based models outperform transfer learning-based models in a majority of cases, and that Proto-MAML is the best performing model, as it can quickly generalize and adapt to new languages with only a few labeled data points (generally, 16 samples per class yields an effective performance) to identify hateful or offensive content.
Keyword: [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-NI]Computer Science [cs]/Networking and Internet Architecture [cs.NI]; [INFO.INFO-SI]Computer Science [cs]/Social and Information Networks [cs.SI]; Cross-lingual classification; Few-shot learning; Hate speech; Meta learning; Offensive language; Transfer learning; XLMRoBERTa
URL: https://doi.org/10.1109/ACCESS.2022.3147588
https://hal.archives-ouvertes.fr/hal-03559484
BASE
Hide details
2
Dataset of coronavirus content from Instagram with an exploratory analysis
In: ISSN: 2169-3536 ; EISSN: 2169-3536 ; IEEE Access ; https://hal.archives-ouvertes.fr/hal-03559489 ; IEEE Access, IEEE, 2021, 9, pp.157192-157202. ⟨10.1109/ACCESS.2021.3126552⟩ (2021)
BASE
Show details
3
Transfer Learning for Multi-lingual Tasks -- a Survey ...
BASE
Show details
4
A First Instagram Dataset on COVID-19 ...
BASE
Show details

Catalogues
0
0
0
0
0
0
0
Bibliographies
0
0
0
0
0
0
0
0
0
Linked Open Data catalogues
0
Online resources
0
0
0
0
Open access documents
4
0
0
0
0
© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Datenschutzeinstellungen ändern